This package takes in data from bulk-Isoseq processed with and without cycloheximide (CHX), a nonsense mediated decay inhibitor (NMDi), and identifies isoforms and genes that are differentially expressed and/or undergoing NMD.
Prior to using this package, the following data processing needs to be performed on bulk Isoseq data:Please view the Complete_workflow_example2.sh file to see a step-by-step usage of this package with dummy data.
The package outputs these following results based on hypothesis testing with chi-square tests within samples and between samples:
Please note that hypothesis 4 is deprecated and is purposely skipped.
If we have an isoform/gene, and the isoform/gene contains a splicing variant that causes NMD, we should see that the isoform/gene has fewer mRNA transcripts because they should be degraded by NMD. If we add an NMDi like CHX, we stabilize the mRNA and we see more reads in the CHX treated sample.
For example, isoforms in HARS1 is known to undergo alternative splicing which results in NMD in UDN212054 As a result, treatment with CHX causes HARS1 isoforms to show up as significant under hypothesis 1. However, EDN1 was not previously known to harbor variants that result in NMD in this patient, so EDN1 is now a potential candidate for evaluating the genetic cause.
| Isoform_PBid | Sample | associated_gene | P_Value_Hyp1 | Cyclo_TPM | Noncyclo_TPM | NormalizedFractionDifference | Phenotypes | PhenotypesNotEmpty |
|---|---|---|---|---|---|---|---|---|
| PB.33489.130 | UDN212054 | EDN1 | 0.0443832 | 1.074748 | 0.000000 | 1.0000000 | {High density lipoprotein cholesterol level QTL 7} (3); Question mark ears, isolated, 612798 (3), Autosomal dominant; Auriculocondylar syndrome 3, 615706 (3), Autosomal recessive | TRUE |
| PB.33489.108 | UDN212054 | EDN1 | 0.0000000 | 834.004603 | 9.369151 | 0.9777817 | {High density lipoprotein cholesterol level QTL 7} (3); Question mark ears, isolated, 612798 (3), Autosomal dominant; Auriculocondylar syndrome 3, 615706 (3), Autosomal recessive | TRUE |
| PB.31662.153 | UDN212054 | HARS1 | 0.0000000 | 13.255228 | 0.000000 | 1.0000000 | Charcot-Marie-Tooth disease, axonal, type 2W, 616625 (3), Autosomal dominant; Usher syndrome type 3B, 614504 (3), Autosomal recessive | TRUE |
For hypothesis 2, we are only looking at the reads from the noncyclo samples.
You might be thinking, why are we even looking at noncyclo reads in isolation. Don’t we need the cyclo reads to say anything about NMD? And you’re totally right, we would need the cyclo reads to study NMD. However, NMD isn’t the only way transcripts can be affected by genetic variants. Transcripts can also just have lower expression due to regulatory variants for example. So we wanted a hypothesis that can also allow us to find transcripts that have lower or higher expression due to things other than NMD. Which is why we came up with hypothesis 2, which allows us to find isoforms that have lower/higher expression regardless of NMD.
Here are specifics regarding the statisical testing using a chi-square 2x2 table:For example, VTA1 is known to have decreased expression in UDN687128. Furthermore, the decrease in expression is allele-specific and not related to NMD.
| Sample | associated_gene | Max_P_Value_Hyp2_below_median | Cyclo_TPM | Noncyclo_TPM | Noncyclo_Z_Score | Phenotypes | PhenotypesNotEmpty |
|---|---|---|---|---|---|---|---|
| UDN687128 | VTA1 | 0 | 97.95027 | 81.20662 | -2.538116 | FALSE |
Hypothesis 3 is very similar to hypothesis 2, but in hypothesis 3, we are using just the cyclo samples. Hypothesis 3 allows us to pick out isoforms/genes that are differentially expressed when treated with cycloheximide.
Here are specifics regarding the statisical testing using a chi-square 2x2 table:For example, SET isoforms are known to have splicing variants that result in NMD in UDN215640. UDN215640 also has SET isoforms with splicing variants that do not undergo NMD, as a result, you can notice that there are some isoforms that are highlighted in SET where the cyclo TPM is not greater than the noncyclo TPM. While these isoforms are shown here as being captured by hypothesis 3, they are also captured by hypothesis 2.
| Isoform_PBid | Sample | associated_gene | Max_P_Value_Hyp3_above_median | Cyclo_TPM | Noncyclo_TPM | Cyclo_Z_Score | Phenotypes | PhenotypesNotEmpty |
|---|---|---|---|---|---|---|---|---|
| PB.51843.120 | UDN215640 | SET | 0.0000000 | 18.292148 | 27.6827805 | 3.327809 | Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant | TRUE |
| PB.51843.135 | UDN215640 | SET | 0.0009002 | 2.918960 | 5.9320244 | 3.315860 | Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant | TRUE |
| PB.51843.335 | UDN215640 | SET | 0.0238786 | 1.556779 | 0.8987916 | 3.312667 | Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant | TRUE |
| PB.51843.116 | UDN215640 | SET | 0.0000000 | 11.870436 | 14.2009069 | 3.326596 | Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant | TRUE |
| PB.51843.58 | UDN215640 | SET | 0.0005698 | 3.113557 | 2.8761330 | 3.328201 | Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant | TRUE |
| PB.51843.67 | UDN215640 | SET | 0.0000378 | 4.281141 | 7.9093659 | 3.328201 | Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant | TRUE |
| PB.51843.132 | UDN215640 | SET | 0.0388493 | 1.362181 | 0.0000000 | 3.328201 | Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant | TRUE |
| PB.51843.355 | UDN215640 | SET | 0.0147725 | 1.751376 | 1.9773415 | 3.328201 | Intellectual developmental disorder, autosomal dominant 58, 618106 (3), Autosomal dominant | TRUE |
Instead of looking at isoforms individually like we did in hypothesis 1, we wanted to have a method of grouping novel isoforms together. This way, we increase the statistical power to determine whether novel isoforms in a gene is undergoing NMD. We make the assumption that isoforms that have low/no abundance in the noncyclo sample are isoforms that are “novel” and likely ones that underwent NMD. We want to group these together into a single bin so that we have more power for statistical testing.
Here is a diagram illustrating the binning process:For example, MFN2 is known to have several isoforms that have alternative splicing and results in NMD. Individually, these novel isoforms have low read counts. However, when combined together at the gene level, we are able to pick out MFN2 as a gene containing novel isoforms that undergo NMD.
| Sample | associated_gene | P_Value_Hyp5 | proportion_in_Bin1_cyclo | proportion_in_Bin1_noncyclo | Cyclo_TPM | Noncyclo_TPM | NormalizedFractionDifference | Phenotypes | P_Value_Hyp1 | PhenotypesNotEmpty |
|---|---|---|---|---|---|---|---|---|---|---|
| UDN633333 | MFN2 | 0 | 0.4 | 0 | 25.17233 | 24.84468 | 0.0065509 | Lipomatosis, multiple symmetric, with or without peripheral neuropathy, 151800 (3), Autosomal recessive; Charcot-Marie-Tooth disease, axonal, type 2A2A, 609260 (3), Autosomal dominant; Charcot-Marie-Tooth disease, axonal, type 2A2B, 617087 (3), Autosomal recessive; Hereditary motor and sensory neuropathy VIA, 601152 (3), Autosomal dominant | 0.9683644 | TRUE |
To validate the effects of NMD, PCA was performed on the gene-level (left) and isoform-level (right) expression matrix for the 13 patients (26 samples) included in the analysis.
This package is developed during my time at the University of Washington as a Genome Sciences graduate student in the Stergachis Lab. Acknowledgements to:
Andrew Stergachis
Adriana Sedeno Cortes
More to be added soon